

Gradient Descent Can Take Exponential Time to Escape Saddle Points

Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Aarti Singh, Barnabas Poczos

Neural Information Processing Systems

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points--it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
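
To make the contrast concrete, the sketch below (a simplified illustration, not the paper's construction) compares plain GD with a perturbed-GD loop in the spirit of Jin et al. [2017]: whenever the gradient is small, a point sampled from a small ball is added, which pushes the iterate off the stable manifold of a strict saddle. The test function, step size, and thresholds are illustrative choices.

```python
import numpy as np

def gd(grad, x0, eta=0.1, steps=500):
    """Plain gradient descent; it can converge to a strict saddle point."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= eta * grad(x)
    return x

def perturbed_gd(grad, x0, eta=0.1, steps=500, g_thresh=1e-3, radius=1e-2, seed=0):
    """GD with occasional random perturbations (in the spirit of Jin et al., 2017):
    when the gradient is small (a candidate saddle region), add noise from a small ball."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        if np.linalg.norm(grad(x)) <= g_thresh:
            d = rng.normal(size=x.shape)
            d *= radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(d)
            x += d  # uniform sample from a ball of the given radius
        x -= eta * grad(x)
    return x

# Toy strict saddle: f(x, y) = x^4/4 - x^2/2 + y^2/2 has a saddle at the origin
# and local minima at (+-1, 0); the stable manifold of the saddle is {x = 0}.
grad_f = lambda z: np.array([z[0] ** 3 - z[0], z[1]])

print(gd(grad_f, [0.0, 1.0]))            # converges to the saddle (0, 0) and stays there
print(perturbed_gd(grad_f, [0.0, 1.0]))  # escapes to a local minimizer near (+-1, 0)
```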




Reviews: SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

Neural Information Processing Systems

After response: I stand by my evaluation of the technical innovation and suboptimality of this paper. The basic Spider and SpiderBoost algorithms both target first-order stationary points, they are almost the same, and both achieve the n^{1/2} rate. The simple way to modify either algorithm to escape saddle points is to add a Negative Curvature Search (NCS) subroutine (which can be done in a very modular way, and is already shown in the Spider paper). I'd say it is almost trivial to also show that SpiderBoost + NCS finds a second-order stationary point at the n^{1/2} rate. Comparing this paper with SpiderBoost + NCS, there is no improvement from n^{2/3} to n^{1/2} (since SpiderBoost is already n^{1/2}) and no simplification of Spider (as SpiderBoost already did that). The only difference is replacing NCS with perturbations, which again requires some work, but most of the techniques are already there.
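
For context on what the review is comparing, here is a minimal sketch of the recursive (SPIDER/SpiderBoost-style) stochastic gradient estimator that SSRGD also builds on; the finite-sum interface, batch sizes, and step size are illustrative assumptions, and the saddle-point handling is omitted.

```python
import numpy as np

def recursive_gradient_descent(grad_i, n, x0, eta=0.05, epochs=20, q=32, batch=8, seed=0):
    """Sketch of a SPIDER/SpiderBoost-style recursive gradient estimator for a
    finite sum f(x) = (1/n) * sum_i f_i(x), where grad_i(i, x) = gradient of f_i at x.

    The estimator is refreshed with the full gradient every q steps and updated
    in between with minibatch gradient differences:
        v_t = mean_{i in S} [grad_i(i, x_t) - grad_i(i, x_{t-1})] + v_{t-1}
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    x_prev, v = x.copy(), None
    for t in range(epochs * q):
        if t % q == 0:
            v = np.mean([grad_i(i, x) for i in range(n)], axis=0)  # full-gradient refresh
        else:
            S = rng.integers(0, n, size=batch)
            v = v + np.mean([grad_i(i, x) - grad_i(i, x_prev) for i in S], axis=0)
        x_prev, x = x, x - eta * v
    return x
```

The difference the review highlights is that SSRGD replaces a separate negative-curvature-search subroutine with small random perturbations applied when the (estimated) gradient is small.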


Escape saddle points by a simple gradient-descent based algorithm

Neural Information Processing Systems

Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function f\colon\mathbb{R}^n\to\mathbb{R}, it outputs an \epsilon-approximate second-order stationary point in \tilde{O}(\log n/\epsilon^{1.75}) iterations. Technically, our main contribution is an idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieve a polynomial speedup in \log n compared to perturbed gradient descent methods. Finally, we also perform numerical experiments that support our results.
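
A rough idea of a gradients-only Hessian power method can be sketched as follows (a plain, non-robust variant under illustrative assumptions, not the paper's algorithm): Hessian-vector products are approximated by finite differences of gradients, and power iteration on the shifted operator \ell I - \nabla^2 f(x) extracts an approximate most-negative-curvature direction near a saddle point.

```python
import numpy as np

def hvp(grad, x, v, r=1e-5):
    """Hessian-vector product from gradients alone: H(x) v ~ (grad(x + r v) - grad(x)) / r."""
    return (grad(x + r * v) - grad(x)) / r

def negative_curvature_direction(grad, x, ell=1.0, iters=100, seed=0):
    """Power iteration on (ell * I - H) to approximate the eigenvector of the most
    negative Hessian eigenvalue; ell should upper-bound the largest eigenvalue
    (e.g., the gradient-Lipschitz constant)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = ell * v - hvp(grad, x, v)
        v /= np.linalg.norm(v)
    curvature = float(v @ hvp(grad, x, v))  # Rayleigh quotient along v
    return v, curvature

# Example near the saddle of f(x, y) = x^2 - y^2 (Hessian eigenvalues +2 and -2).
grad_f = lambda z: np.array([2.0 * z[0], -2.0 * z[1]])
v, lam = negative_curvature_direction(grad_f, np.zeros(2), ell=3.0)
print(v, lam)  # v close to (0, +-1), lam close to -2: a direction of negative curvature
```

If the returned curvature is sufficiently negative, stepping along one of ±v decreases the function and moves the iterate away from the saddle region.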


Reviews: Gradient Descent Can Take Exponential Time to Escape Saddle Points

Neural Information Processing Systems

It has recently been shown that, when all the saddle points of a non-convex function are "strict saddles", gradient descent with a (reasonable) random initialization converges to a local minimizer with probability one. For a randomly perturbed version of gradient descent, the convergence rate can additionally be shown to be polynomial in the parameters. This article proves that such a convergence rate does not hold for the non-perturbed version: there exist reasonably smooth functions with only strict saddle points, and natural initialization schemes, such that gradient descent requires a number of steps that is exponential in the dimension to find a local minimum. I liked this article very much. It answers a very natural question: gradient descent is an extremely classical and very simple algorithm.
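
A back-of-the-envelope calculation (a standard estimate, not the paper's construction) shows why proximity to a saddle's stable manifold slows gradient descent: at a strict saddle, \nabla f(x^*)=0 and the Hessian has a negative eigenvalue -\gamma < 0, so to first order the iterate's offset \delta_t along that negative-curvature direction grows as \delta_{t+1} \approx (1+\eta\gamma)\,\delta_t. Leaving a ball of radius r around the saddle from an initial offset \delta_0 therefore takes roughly \log(r/\delta_0)/\log(1+\eta\gamma) iterations, and an initialization exactly on the stable manifold (\delta_0 = 0) never leaves at all. The article's construction chains together many such saddle points in a way that makes the total number of steps exponential in the dimension.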


A Realistic Example in 2 Dimension that Gradient Descent Takes Exponential Time to Escape Saddle Points

Zuo, Shiliang

arXiv.org Machine Learning

Gradient descent is a popular algorithm in optimization, and its performance in convex settings is mostly well understood. In non-convex settings, it has been shown that gradient descent is able to escape saddle points asymptotically and converge to local minimizers [Lee et al., 2016]. Recent studies also show that a perturbed version of gradient descent is enough to escape saddle points efficiently [Ge et al., 2015, Jin et al., 2017]. In this paper we show a negative result: gradient descent may take exponential time to escape saddle points, even for non-pathological two-dimensional functions. While our focus is theoretical, we also conduct experiments verifying our theoretical result. Through our analysis we demonstrate that stochasticity is essential to escape saddle points efficiently.
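
The following toy experiment (an illustrative sketch under assumed parameters, not the paper's construction) shows the two ingredients the abstract points to: plain GD's escape time from a single strict saddle blows up as the initialization approaches the stable manifold, while a small amount of injected noise keeps it bounded.

```python
import numpy as np

# Toy 2D strict saddle: f(x, y) = (y**2 - x**2) / 2, saddle at the origin,
# unstable along x, stable manifold {x = 0}.
grad_f = lambda z: np.array([-z[0], z[1]])

def escape_time(delta, eta=0.1, noise=0.0, max_iter=10_000, seed=0):
    """Iterations until |x| > 0.5, starting at (delta, 1); noise=0 gives plain GD."""
    rng = np.random.default_rng(seed)
    z = np.array([delta, 1.0])
    for t in range(max_iter):
        z = z - eta * grad_f(z) + noise * rng.normal(size=2)
        if abs(z[0]) > 0.5:
            return t + 1
    return max_iter  # did not escape within the budget

for delta in (1e-2, 1e-8, 1e-16, 0.0):
    print(f"delta={delta:g}  GD: {escape_time(delta):5d}   noisy GD: {escape_time(delta, noise=1e-3):5d}")
```

Plain GD's escape time grows with \log(1/\delta) and never escapes at \delta = 0, whereas the noisy runs escape in a roughly constant number of iterations regardless of \delta.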

